MERS exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Estimating evolutionary distances between genomic sequences from spaced-word matches.

Internal identifier: 000087 (France/Analysis)

Authors: Burkhard Morgenstern [France]; Bingyao Zhu [Germany]; Sebastian Horwege [Germany]; Chris André Leimeister [Germany]

RBID : pubmed:25685176

Abstract

Alignment-free methods are increasingly used to calculate evolutionary distances between DNA and protein sequences as a basis of phylogeny reconstruction. Most of these methods, however, use heuristic distance functions that are not based on any explicit model of molecular evolution. Herein, we propose a simple estimator d_N of the evolutionary distance between two DNA sequences that is calculated from the number N of (spaced) word matches between them. We show that this distance function is more accurate than other distance measures that are used by alignment-free methods. In addition, we calculate the variance of the normalized number N of (spaced) word matches. We show that the variance of N is smaller for spaced words than for contiguous words, and that the variance is further reduced if our spaced-words approach is used with multiple patterns of 'match positions' and 'don't care positions'. Our software is available online and as downloadable source code at: http://spaced.gobics.de/.
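
The abstract defines N as the number of (spaced) word matches between two sequences. The Python sketch below illustrates that quantity under stated assumptions: a pattern is written as a string of '1' (match position) and '0' (don't-care position), and N counts every pair of positions (i, j) whose spaced words are identical. All function names are hypothetical. `naive_distance` is a deliberately simplified stand-in (equal-length sequences without indels, no correction for random background matches, Jukes-Cantor correction of the inferred mismatch rate) and is not the estimator d_N published by the authors, whose implementation is available at http://spaced.gobics.de/.

```python
import math
from collections import Counter


def spaced_words(seq: str, pattern: str) -> Counter:
    """Count the spaced words of `seq` under a binary pattern.

    '1' marks a match position, '0' a don't-care position; the spaced
    word at offset i keeps only the characters at match positions.
    """
    match_pos = [p for p, c in enumerate(pattern) if c == "1"]
    counts = Counter()
    for i in range(len(seq) - len(pattern) + 1):
        counts["".join(seq[i + p] for p in match_pos)] += 1
    return counts


def spaced_word_matches(s1: str, s2: str, pattern: str) -> int:
    """N = number of position pairs (i, j) whose spaced words are identical."""
    c1 = spaced_words(s1, pattern)
    c2 = spaced_words(s2, pattern)
    # Each shared word w contributes count_in_s1(w) * count_in_s2(w) pairs.
    return sum(n * c2[w] for w, n in c1.items())


def naive_distance(s1: str, s2: str, pattern: str) -> float:
    """Hypothetical simplified estimator, NOT the published d_N.

    Assumes equal-length sequences without indels and does not correct
    for random background matches, so E[N] ~ (L - l + 1) * p**k, where
    p is the per-site match probability and k the number of match
    positions. p is inverted from N, then Jukes-Cantor corrected.
    """
    k = pattern.count("1")
    positions = len(s1) - len(pattern) + 1
    if k == 0 or positions <= 0:
        raise ValueError("pattern needs a match position and must fit in s1")
    n = spaced_word_matches(s1, s2, pattern)
    # Background matches can push N above `positions`; cap p at 1.
    p = min(1.0, (n / positions) ** (1.0 / k))
    mismatch = 1.0 - p
    if mismatch >= 0.75:
        return math.inf  # Jukes-Cantor distance is undefined (saturation)
    return -0.75 * math.log(1.0 - 4.0 / 3.0 * mismatch)
```

For example, `spaced_word_matches("ACGT", "ACGA", "11")` finds the two shared words AC and CG, so N = 2.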

DOI: 10.1186/s13015-015-0032-x
PubMed: 25685176

The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en">Estimating evolutionary distances between genomic sequences from spaced-word matches.</title>
<author>
<name sortKey="Morgenstern, Burkhard" sort="Morgenstern, Burkhard" uniqKey="Morgenstern B" first="Burkhard" last="Morgenstern">Burkhard Morgenstern</name>
<affiliation wicri:level="3">
<nlm:affiliation>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen, 37073 Germany ; Université d'Evry Val d'Essonne, Laboratoire Statistique et Génome, UMR CNRS 8071, USC INRA 23 Boulevard de France, Evry, 91037 France.</nlm:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen, 37073 Germany ; Université d'Evry Val d'Essonne, Laboratoire Statistique et Génome, UMR CNRS 8071, USC INRA 23 Boulevard de France, Evry</wicri:regionArea>
<placeName>
<region type="region">Île-de-France</region>
<region type="old region">Île-de-France</region>
<settlement type="city">Évry (Essonne)</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Zhu, Bingyao" sort="Zhu, Bingyao" uniqKey="Zhu B" first="Bingyao" last="Zhu">Bingyao Zhu</name>
<affiliation wicri:level="3">
<nlm:affiliation>University of Göttingen, Department of General Microbiology, Grisebachstr. 8, Göttingen, 37073 Germany.</nlm:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>University of Göttingen, Department of General Microbiology, Grisebachstr. 8, Göttingen</wicri:regionArea>
<placeName>
<region type="land" nuts="2">Basse-Saxe</region>
<settlement type="city">Göttingen</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Horwege, Sebastian" sort="Horwege, Sebastian" uniqKey="Horwege S" first="Sebastian" last="Horwege">Sebastian Horwege</name>
<affiliation wicri:level="3">
<nlm:affiliation>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen, 37073 Germany.</nlm:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen</wicri:regionArea>
<placeName>
<region type="land" nuts="2">Basse-Saxe</region>
<settlement type="city">Göttingen</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Leimeister, Chris Andre" sort="Leimeister, Chris Andre" uniqKey="Leimeister C" first="Chris André" last="Leimeister">Chris André Leimeister</name>
<affiliation wicri:level="3">
<nlm:affiliation>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen, 37073 Germany.</nlm:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen</wicri:regionArea>
<placeName>
<region type="land" nuts="2">Basse-Saxe</region>
<settlement type="city">Göttingen</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">PubMed</idno>
<date when="2015">2015</date>
<idno type="RBID">pubmed:25685176</idno>
<idno type="pmid">25685176</idno>
<idno type="doi">10.1186/s13015-015-0032-x</idno>
<idno type="wicri:Area/PubMed/Corpus">001700</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Corpus" wicri:corpus="PubMed">001700</idno>
<idno type="wicri:Area/PubMed/Curation">001700</idno>
<idno type="wicri:explorRef" wicri:stream="PubMed" wicri:step="Curation">001700</idno>
<idno type="wicri:Area/PubMed/Checkpoint">001559</idno>
<idno type="wicri:explorRef" wicri:stream="Checkpoint" wicri:step="PubMed">001559</idno>
<idno type="wicri:Area/Ncbi/Merge">001041</idno>
<idno type="wicri:Area/Ncbi/Curation">001041</idno>
<idno type="wicri:Area/Ncbi/Checkpoint">001041</idno>
<idno type="wicri:doubleKey">1748-7188:2015:Morgenstern B:estimating:evolutionary:distances</idno>
<idno type="wicri:Area/Main/Merge">001809</idno>
<idno type="wicri:Area/Main/Curation">001804</idno>
<idno type="wicri:Area/Main/Exploration">001804</idno>
<idno type="wicri:Area/France/Extraction">000087</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en">Estimating evolutionary distances between genomic sequences from spaced-word matches.</title>
<author>
<name sortKey="Morgenstern, Burkhard" sort="Morgenstern, Burkhard" uniqKey="Morgenstern B" first="Burkhard" last="Morgenstern">Burkhard Morgenstern</name>
<affiliation wicri:level="3">
<nlm:affiliation>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen, 37073 Germany ; Université d'Evry Val d'Essonne, Laboratoire Statistique et Génome, UMR CNRS 8071, USC INRA 23 Boulevard de France, Evry, 91037 France.</nlm:affiliation>
<country xml:lang="fr">France</country>
<wicri:regionArea>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen, 37073 Germany ; Université d'Evry Val d'Essonne, Laboratoire Statistique et Génome, UMR CNRS 8071, USC INRA 23 Boulevard de France, Evry</wicri:regionArea>
<placeName>
<region type="region">Île-de-France</region>
<region type="old region">Île-de-France</region>
<settlement type="city">Évry (Essonne)</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Zhu, Bingyao" sort="Zhu, Bingyao" uniqKey="Zhu B" first="Bingyao" last="Zhu">Bingyao Zhu</name>
<affiliation wicri:level="3">
<nlm:affiliation>University of Göttingen, Department of General Microbiology, Grisebachstr. 8, Göttingen, 37073 Germany.</nlm:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>University of Göttingen, Department of General Microbiology, Grisebachstr. 8, Göttingen</wicri:regionArea>
<placeName>
<region type="land" nuts="2">Basse-Saxe</region>
<settlement type="city">Göttingen</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Horwege, Sebastian" sort="Horwege, Sebastian" uniqKey="Horwege S" first="Sebastian" last="Horwege">Sebastian Horwege</name>
<affiliation wicri:level="3">
<nlm:affiliation>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen, 37073 Germany.</nlm:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen</wicri:regionArea>
<placeName>
<region type="land" nuts="2">Basse-Saxe</region>
<settlement type="city">Göttingen</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Leimeister, Chris Andre" sort="Leimeister, Chris Andre" uniqKey="Leimeister C" first="Chris André" last="Leimeister">Chris André Leimeister</name>
<affiliation wicri:level="3">
<nlm:affiliation>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen, 37073 Germany.</nlm:affiliation>
<country xml:lang="fr">Allemagne</country>
<wicri:regionArea>University of Göttingen, Department of Bioinformatics, Goldschmidtstr. 1, Göttingen</wicri:regionArea>
<placeName>
<region type="land" nuts="2">Basse-Saxe</region>
<settlement type="city">Göttingen</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j">Algorithms for molecular biology : AMB</title>
<idno type="ISSN">1748-7188</idno>
<imprint>
<date when="2015" type="published">2015</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
</fileDesc>
<profileDesc>
<textClass></textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">Alignment-free methods are increasingly used to calculate evolutionary distances between DNA and protein sequences as a basis of phylogeny reconstruction. Most of these methods, however, use heuristic distance functions that are not based on any explicit model of molecular evolution. Herein, we propose a simple estimator d N of the evolutionary distance between two DNA sequences that is calculated from the number N of (spaced) word matches between them. We show that this distance function is more accurate than other distance measures that are used by alignment-free methods. In addition, we calculate the variance of the normalized number N of (spaced) word matches. We show that the variance of N is smaller for spaced words than for contiguous words, and that the variance is further reduced if our spaced-words approach is used with multiple patterns of 'match positions' and 'don't care positions'. Our software is available online and as downloadable source code at: http://spaced.gobics.de/. </div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Allemagne</li>
<li>France</li>
</country>
<region>
<li>Basse-Saxe</li>
<li>Île-de-France</li>
</region>
<settlement>
<li>Göttingen</li>
<li>Évry (Essonne)</li>
</settlement>
</list>
<tree>
<country name="France">
<region name="Île-de-France">
<name sortKey="Morgenstern, Burkhard" sort="Morgenstern, Burkhard" uniqKey="Morgenstern B" first="Burkhard" last="Morgenstern">Burkhard Morgenstern</name>
</region>
</country>
<country name="Allemagne">
<region name="Basse-Saxe">
<name sortKey="Zhu, Bingyao" sort="Zhu, Bingyao" uniqKey="Zhu B" first="Bingyao" last="Zhu">Bingyao Zhu</name>
</region>
<name sortKey="Horwege, Sebastian" sort="Horwege, Sebastian" uniqKey="Horwege S" first="Sebastian" last="Horwege">Sebastian Horwege</name>
<name sortKey="Leimeister, Chris Andre" sort="Leimeister, Chris Andre" uniqKey="Leimeister C" first="Chris André" last="Leimeister">Chris André Leimeister</name>
</country>
</tree>
</affiliations>
</record>

To process this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Sante/explor/MersV1/Data/France/Analysis
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000087 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/France/Analysis/biblio.hfd -nk 000087 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Sante
   |area=    MersV1
   |flux=    France
   |étape=   Analysis
   |type=    RBID
   |clé=     pubmed:25685176
   |texte=   Estimating evolutionary distances between genomic sequences from spaced-word matches.
}}

To generate wiki pages

HfdIndexSelect -h $EXPLOR_AREA/Data/France/Analysis/RBID.i   -Sk "pubmed:25685176" \
       | HfdSelect -Kh $EXPLOR_AREA/Data/France/Analysis/biblio.hfd   \
       | NlmPubMed2Wicri -a MersV1 

Wicri

This area was generated with Dilib version V0.6.33.
Data generation: Mon Apr 20 23:26:43 2020. Site generation: Sat Mar 27 09:06:09 2021